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I.  Introduction 


The  issue  of  estimating  the  parameters  of  non-Gaussian  autoregressive  (AR) 
processes  has  received  significant  consideration  in  recent  years  [Martin  1982) ,  [Kay 
and  Sengupta  1986,  A).  These  processes  are  modeled  by  an  all-pole  filter  excited  by 
zero-mean  white  noise,  also  called  the  driving  noise,  which  may  be  non-Gaussian. 
In  many  applications,  however,  the  driving  noise  can  not  be  assumed  to  have  zero 
mean.  As  an  example,  it  may  be  necessary  to  process  a  deterministic  signal  con¬ 
taminated  by  non-Gaussian  noise  after  both  the  signal  and  the  noise  have  passed 
through  a  channel.  A  method  to  estimate  the  parameters  of  a  non-Gaussian  non¬ 
zero  mean  AR  process  may  be  an  useful  tool  for  such  processing.  Another  problem 
of  interest  is  the  estimation  of  the  parameters  of  a  zero-mean  AR  process  when  a 
combination  of  deterministic  signals  of  unknown  amplitude  is  added  to  it  before  it 
is  observed.  This  paper  suggests  a  generalization  of  an  estimation  technique  pro¬ 
posed  earlier  [Kay  and  Sengupta  1987,  Lee  1987]  in  order  to  make  it  applicable 
to  both  the  non-zero  mean  processes  described  above.  Essentially  it  is  a  two  stage 
procedure  based  on  an  approximation  of  the  maximum  likelihood  estimator  (MLE). 
The  resulting  estimator  is  asymptotically  efficient  in  the  sense  that  its  covariance 
matrix  approaches  the  Fisher  information  matrix  for  large  data  records. 

The  paper  is  organized  as  follows.  The  Cramer-Rao  bound  for  the  parameters  of 
each  model  is  presented  in  Section  II.  Section  HI  develops  the  two-stage  least  squares 
(LS)  estimator  for  these  parameters.  In  Section  IV  this  method  is  applied  to  the 
problem  of  detecting  a  signal  known  except  for  amplitude  in  the  presence  of  colored 
non-Gaussian  noise  using  a  generalized  likelihood  ratio  test  (GLRT).  Although  the 
GLRT  is  known  to  have  nice  asymptotic  properties,  it  could  not  be  used  for  this 
problem  before  due  to  the  unavailability  of  a  reasonably  good  estimation  technique. 
Section  V  presents  the  results  of  computer  simulation  of  the  performance  of  the  two- 
stage  LS  estimator.  The  performance  of  the  GLRT  detector  for  a  typical  detection 
problem  is  also  simulated.  Section  VI  summarizes  the  results. 


II.  Cramer-Rao  Bounds 

Consider  N  observations  of  the  following  non-zero  mean  AR(p)  processes 

p 

zn  =  £ayxn_y  4*  sn  +  un  (l) 

j  =  i 
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p 

Xr,  -  HTSn  -  ^a;(i n-:  ~  HTS n_;)  -  un  (2) 

;'=i 

where  the  driving  noise  un  has  a  zero- mean  PDF  f{un)  with  variance  a2  and  {sn}  = 
{\/nl)  Sn'1  ■  •  •  5^]t}  is  a  known  sequence  of  vectors  of  order  q.  uT  =  [u\  1/2  •  •  •  i/q\ 
and  nT  =  [/xi  ^2  •  •  •  ]  are  the  vectors  of  coefficients  of  sn  in  models  (l)  and  (2), 

respectively.  /  is  assumed  to  be  a  symmetric  function  of  un.  It  is  also  assumed 
that  /  has  tails  heavier  than  a  Gaussian  PDF  having  equal  variance.  This  is  char¬ 
acterized  by  the  presence  of  impulses  in  the  driving  noise  time  series,  (l)  represents 
an  AR  process  where  a  deterministic  mean  is  added  to  the  otherwise  zero-mean 
driving  noise  (see  Figure  1(a)).  This  model  is  also  useful  in  some  system  identifi¬ 
cation  problems  [Eykhoff  1974].  (2)  represents  a  zero-mean  AR  processes  to  which 
a  deterministic  signal  has  been  added  before  it  is  observed  (see  Figure  1(b)).  It  is 
easy  to  observe  that  if  is  a  geometric  sequence 


(3) 


for  t  =  1,  2,  •  •  • ,  <7,  then  (2)  takes  the  form  of  (1)  with 


vi  =  ^  1 1  +  J  =  wEa)r.  ;> 

;=i  /  3=o 


i'=l,2,- 


(4) 


where  ao  is  defined  to  be  unity.  This  is  not  surprising,  since  (3)  is  the  well-known 
eigenfunction  of  a  linear  shift  invariant  (LSI)  filter.  It  can  be  shown  that  (3)  is 
the  only  possible  form  of  sit'1  for  which  there  is  direct  correspondence  between  the 
elements  of  v  and  /i.  This  special  case  includes  real  and  complex  exponentials,  dc 
signals[Anderson  1971]  and  alternating  signals.  It  can  be  generalized  to  the  case  of 
pairwise  correspondence  between  the  elements  of  u  and  n  as  follows.  Consider  for 
<7  =  2 

sj,1^  =  rn  cos  (nw) 

=  r"  sin  (nw) 

It  follows  that  (2)  is  equivalent  to  (l)  with 


(5) 


(  V\  \  =  f£>=or  ;cos(jw)  -£y=or  ;  sin(jx;) 

V  )  V  £y=or_J  sin(jw)  E;=or_;  cos(;w) 


Mi 

M2 


(6) 


This  special  case  is  useful  in  representing  damped  or  undamped  sinusoids  with 
arbitrary  phase.  In  general,  sn  may  be  composed  of  a  combination  of  functions  of 


3 


the  form  (3)  and  (5)  in  order  to  make  (2)  equivalent  to  (1).  The  importance  of  this 
equivalence  will  be  apparent  later. 

The  joint  PDF  of  process  (1)  is  given  by 

fx(x)  =  fe(x)fp(x)  (7) 


where 


N 


*c(x)  =  n  /  ~  ^Sn 

n=p-t-l  \>=0 


fp(x)  ~  xj  •••  rp  (^1  i  '  ’  '  )  xp) 

The  information  matrix  for  the  parameter  vector  #  can  be  shown  to  be 

d 2  In  fc  (x) 


/*  =  -E 


d*d*2 


-  E 


d2  lnfp(x) 
d*d*T 


N 


-  L  * 

n=p+l 


\d 2  In /(«*)] 

-  E 

d2  lnfp(x) 

d*d*T 

d*d$T 

(8) 


The  second  term  accounts  for  the  information  from  the  first  p  samples  while  the 
first  term  corresponds  to  the  information  from  all  subsequent  samples  up  to  N.  For 
large  sample  size  the  second  term  may  be  neglected.  Defining  the  parameter  vector 
as 


*  =  [z/r  aT  a2]r 


where  a  =  [ax  •  •  •  ap )r,  it  is  shown  in  the  Appendix  that 


(9) 
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f  IfSTS 
-If  MTS 
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where 

■  g  2 

If  =  E  —In /(u) 

'”  =  4[£[4ln/(u)f~1 

S  =  ■  s^j 

s(‘)  -  [.(*)  .(*)  ...  s(*)lr  -  _1  2  ...  a 

s  “  [5p+l  5p+2  sN  1  —  L1Z1  »9 

mp  rnp_i  ■■■  m\  \ 

mp+i  mp  m2 

mjv-i  rnjv-2  •••  ^N-p' 

C  is  the  p  x  p  covariance  matrix  of  the  time  series  (1)  and  the  sequence  {mi, m2, 
■  •  •  ,mjv}  is  the  mean  of  the  time  series  (l).  They  satisfy  the  recursion 

p 

mn  =  +  uTsn  (11) 

;'= 1 

which  is  the  output  of  the  filter  excited  by  the  signal  The  Cramer-Rao  lower 

bounds  for  the  parameters  are  obtained  from  the  inverse  of  (10)  (see  Appendix). 

l  r  r 

Cov(u)  >  |SnT  [i  +  Mn[{N  -  p)Cn]-lMnTJ  Snj  (12a) 

Cov( a)  >  ~  [{N-p)Cn  +  Mnr  (12ft) 

Var^]  £  (n  -P)i;,  (12c) 

The  subscript  n  denotes  normalization  of  the  corresponding  quantities  by  a.  The 
matrix  inequalities  (12a)  and  (12ft)  indicate  that  the  difference  of  the  left  and  right 
hand  side  matrices  are  nonnegative  definite. 

Since  the  matrix  M n\(N  -  p)Cn]_1M„r  is  positive  definite,  the  right  hand 
side  of  (12a)  approaches  its  minimum  value  of  [a2J/SnTSn]_1  when  Mn  is  close 
to  zero.  This  requires  the  spectrums  of  the  signals  {sn^},  t  =  1,2,  •••,?  to  have 
a  mismatch  with  the  noise  power  spectral  density  (PSD).  The  lack  of  knowledge 
of  the  AR  filter  parameters  always  makes  it  more  difficult  to  estimate  v.  Note 
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that  the  special  case  of  singular  Sn  Sn  has  been  ignored.  This  can  happen  if  and 
only  if  the  columns  of  Sn  are  linearly  dependent  and  consequently  u  is  not  entirely 
observable.  A  suitable  reduction  in  the  size  of  u  is  required  to  avoid  this  situation. 
The  matrix  (I  -  Sn(S^Sn)-1S^)  on  the  right  hand  side  of  (12b)- is  idempotent. 
Therefore  it  is  nonnegative  definite  and  serves  to  reduce  the  CR  bound  by  using  the 
additional  information  about  the  AR  filter  parameters  carried  by  the  signal.  The 
task  of  designing  a  suitable  “probing”  signal  to  extract  a  large  amount  of  information 
about  a  is  a  problem  of  system  identification  [Eykhoff  1974j.  Finally  the  quantity 
a2 1/  is  known  to  achieve  its  minimum  value  of  unity  only  in  the  case  of  a  Gaussian 
PDF  [Sengupta  and  Kay  1986,  Aj.  Hence  u  and  a  may  be  estimated  more  precisely 
in  the  case  of  a  non-Gaussian  PDF  of  the  driving  noise  than  in  the  Gaussian  case. 
The  quantity  a2 1 /  depends  on  the  shape  of  the  PDF  and  is  a  quantitative  measure 
of  the  expected  improvement  over  the  Gaussian  case.  To  show  the  scale-invariance 
of  a2  If  define  g  to  be  the  PDF  of  the  normalized  random  variable  ti  =  u/c r, 

g(u)  =  trf(u) 


which  does  not  depend  on  the  variance  of  u.  Hence  o2 If  depends  only  on  the  shape 
of  the  PDF  and  is  unaffected  by  scaling. 

The  joint  PDF  of  process  (2)  also  has  the  form  (7)  where 

f'W  =  n  /  Ha;  (Zn-J  “  ) 

n=p+l  \j=0  J 
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The  approximate  information  matrix  (ignoring  the  contribution  from  the  first  p 
samples)  for  the  parameter  vector 


*={HT  aT  <72]t 


(13) 


was  derived  by  Martin  [1982]  for  the  special  case  q  =  1  and  =  1  V  n  and  by 
the  authors  [Kay  and  Sengupta  1986,  B]  for  the  case  q  —  1,  arbitrary.  The 
generalization  to  the  case  q  >  1  can  be  easily  shown  to  be 

( i,vTv  : 


0 


pxq 


(N  -  p)IfC 


V  of 


Op 


0, 


(N  -  p)U,  J 


where 


v  = 

V(1)  v(2> 

.  y(*)' 

v(0  = 

’„(*)  V{l)  ■ 

p+ 1  p+2 

••  V(,)lT 

VN 

1  =  1,2,-  • 

p 

=  5Za>Sn-;’  n  =  p  +  l,p+2,---JV 

;=0 


The  CR  bounds  are  immediately  obtained  as 


(14) 


Cov(ft)  > 
Cov( a)  > 
Var(a2)  > 


a2I, 


■(Vjv.) 


-1 


1-1 


(. N-p)o2If  n 

1 

{N  -  p)1 ,2 


(15a) 

(15  b) 

(15c) 


n  once  again  denotes  normalization.  The  lower  bound  on  Cov(ft)  is  small  if  Vn 
is  large.  This  can  be  accomplished  if  there  is  a  significant  mismatch  between  the 
spectrums  of  the  signal  components  {s^},  »  =  1,2, and  the  noise  power 
spectral  density  (PSD)  (see  the  definition  of  V  in  (14)).  Higher  signal  amplitudes 
do  not  help  reduce  the  CR  bound.  The  lower  bound  on  Cou(a)  is  independent  of 
the  shape  of  the  signal.  Both  Cov(ii)  and  Cov( a)  have  smaller  lower  bounds  in  the 
case  of  non-Gaussian  PDF  than  in  the  Gaussian  case  and  the  difference  is  given  by 
the  factor  a2//  as  before. 
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i 


It  is  clear  from  (126)  and  (156)  that  the  CR  bound  for  a  is  lower  in  the  case  of 
model  (1)  than  that  of  model  (2).  It  is  now  shown  that  if  (3)  holds  then  the  bound 
(126)  attains  its  maximum  value  given  by  (156).  Substitution  of  (3)  in  (ll)  yields 

p  9  p  /  9 

uiTnx  =  I  Mlrtn_; 

j—Q  i  =  l  >  =  0  \i=l 


i.e., 

9 

™n  = 

t=l 

Thus  the  Arth  column  of  M  can  be  written  as 

9 

which  belongs  to  the  column  space  of  S.  Hence 

MT[I-S(STS)_1ST]M  =  0 

and  consequently  (126)  and  (156)  are  equivalent.  This  result  has  nice  intuitive 
justification,  since  the  models  (l)  and  (2)  become  equivalent  when  (3)  holds. 


in.  The  MLE  and  its  Approximation 

From  (5)  the  log-likelihood  function  for  process  (l)  is  given  by 


N 


( 


lnfx(x)  =  ln/  HlaiXn-i  ~  pTsn  )  +  lnfp(x) 

n=p+l  \j—0 


The  second  term  is  ignored  for  the  purpose  of  simplicity  of  maximization  over  the 
parameters.  The  remaining  term  is  the  well-known  conditional  likelihood  function 
[Box  and  Jenkins  1970] .  The  MLE  of  #  is  approximately  a  solution  of 


N 


n=p+l 


Uy£n 


=  0 
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Carrying  out  the  differentiation  with  respect  to  r/,, 

n  d 

^  a^~lnjr*'Un^  ju™=EP=0a;I’*-->~l/T8'‘ 


n~  p4- 1 


=  V  .  ,io /'(“•) 

Jr+.  *  /(«*) 


AT 


=  53  5n)unr(un) 

n  =  p+l 

=  0 


u„  =V>  a,  i„-  .•  —  i/rs„ 

L — s  j  ~  0  J  J  n 

u’*  =  EJP=o‘1^I’*-J~,/T8'' 


N 


N 


*■«•»  53  "M  53  ^^(Un)  +  53fl>  (  ~  53  'Sn);Cn-j  T(un) 

fc  = 1  Vn=p+1  /  j=l  \  n=p+l  / 

N 

=  53  ^n^nf  (un)  *  =  1»2,,*',9 

n=p+l 


where 


r(u)  =  -  /iul 
1  '  «/(«) 


Differentiation  with  respect  to  a;  results 
AT 


n=p+ 1 


a 

day 


(16) 


(17) 


fc=i 


(«n) 

u»sB£>»0ai*— * 

AT 

~  ~  }  '  In- yunf(un) 

n=p+l 

=  0,  j  =  1,2,  *  •  • ,  p 

AT  N 

\  p  f  N 

53  sn)xn-jTiun) 

j  ■*"  ^aJ  I  In-iIn-yf'(Hn) 

n=p+ 1  / 

'  J  =  1  \n=p-f  1 

s 


—  53  InXr»-ir(un),  t  —  l,2,---,p 

n=p+l 

Differentiation  with  respect  to  a 2  requires  one  to  define  a  scaled  version  of  / 


(18) 


$(*)  =  <^/(crt) 


(19) 
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so  that,  g  has  unity  variance,  it  follows  that 


.v 


£  p'"/W| 

n  =  p+  1  lu-»  =  2£J  =  0aix'-i-1 


t.e., 


i  ‘v 

~s*£ 


n  =  p  +  1 
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“U(^) 


+  1 


V«=X^JP=0ayIn-J-*/T8" 


=  _J_  y  L  L 

2-2n=V+1ln/K) 


lu*=Ej,o  *>•*■*-> -* 


1 

2^2 


JV 


(w-p)-  5]  u’rM 

n=p+l 


-I  ,U"  =  Ey  =  o^ 


z*-3-i/Tsn 


=  0 


/v 


7^7  £ 

^n=p+l 


u»  =  E-  =  oai*->- 


=  1 


(20) 


For  most  PDF’s  r(un)  is  a  complicated  function  of  o2 .  If  all  the  other  parameters 
are  known,  a 2  can  be  evaluated  from  (20)  by  a  numerical  search.  In  the  case  of  a 
Gaussian  PDF  it  has  been  shown  that  T(u)  =  l/o2  V  u  [Kay  and  Sengupta  1986, 
Aj.  Therefore  (20)  reduces  to  the  familiar  form 

r  n  =  p4-  1  ^ 

which  is  just  the  averaged  sum  of  the  residuals.  In  this  case  (16)  and  (18)  reduce 
to  the  linear  equations 


£ 

j=o 


a,jXn_j  -  v - 


(21) 
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N 


E  S')s"k)  +  EaM  ~  E  *ln)xn-i 

k  —  l  Vn=p+ 1  /  j=  1  V  ri=p+l 


JV 


N 


E^(-  E  Sn)xn-j  )  +  E®>  E  X*-'Xn-) 

k—l  V  n=p+l  /  j=  1  \n  =  p+l  ; 


=  £  4°*. 

n=p+ 1 

i  =  1,2,  •  ,q 
N 

—  ~  E  xnxn—i") 

n=p  +  l 

*  =  1.2, 
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In  the  matrix  form 


... 

-s(DrXl  ••• 

-S(1)TXp  ^ 

/"I  \ 

s(?)  rg(<3) 

-st^Xj  ••• 

-s«)Txp 

U<t 

-xfs(15  ••• 

xfxi 

xfxp 

ai 

^  -x^V1)  ••• 

x£xt 

X^Xp  J 

V  ap  J 

(sU\o\ 

s^)TX0 

-xfxo 


(22) 


where 


V  -x;Txo  J 


xj  —  lzp-;+i  xP-y+2 


j  =  0,1,- 


This  is  a  generalization  [Anderson  1971]  of  the  covariance  method  of  linear  prediction 
which  includes  the  estimation  of  :/  along  with  that  of  a.  The  matrix  on  the  left 
hand  side  can  be  easily  shown  to  be  positive  definite  with  probability  1.  (22)  has 
a  unique  solution  which  is  obtained  in  0(p2)  operations.  The  estimates  of  u  and  a 
can  be  us-'1'4  fhe  MLE  of  cr2  from  (21). 

In  the  general  non-Gaussian  case  the  equations  (16)  and  (18)  are  highly  nonlin¬ 
ear,  since  un  is  a  linear  function  of  v  and  a  and  T  is  usually  a  complicated  function. 
The  form  of  these  equations  resemble  those  of  efficiency  robust  M-estimators  [Huber 
1981]  for  which  T  is  chosen  suitably  to  minimize  the  dependence  of  the  efficiency  of 
the  estimator  on  the  PDF  [Martin  1979],  [Lee  1987],  If,  however,  T  is  as  defined  in 
(17)  and  un  is  computed  using  some  fixed  and  approximate  value  of  v  and  a,  both 
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the  equations  become  linear.  In  the  matrix  form  they  can  be  written  as 


where  T  =  diag{Y[up+i),T{up+2),  •  •  •  ,T(ujv)}  and 


un  —  in  -t-  ^  u  s^,  n  —  p  1,  p  -t-  2,  •  •  • ,  N  (24) 

y=i 

which  uses  fixed  and  approximate  values  of  u  and  a  obtained  from  an  initial  stage  of 
unweighted  LS  given  by  (22).  (23)  can  be  interpreted  as  the  solution  to  the  weighted 
least  squares  (LS)  problem 

s  f  P  \2 

min  I  x«  +  5Z°yIn-y  "  *'r s n  r(un)  (25) 

1/’“  n=p+ 1  \  ]  =  l  J 

The  solution  to  (23)  is  expected  to  be  much  better  than  the  unweighted  LS  esti¬ 
mators,  such  as  the  covariance  estimator,  which  implicitly  assumes  the  underlying 
PDF  to  be  Gaussian.  X  is  a  symmetric  (p  +  l)  x  (p  +  1)  matrix  which  is  positive 
definite  with  probability  1.  To  show  this  write  X  as 

X  =  [s^  ■  •  •  S ^  Xi  X2  •  •  •  Xp]Tr[s(l)  s™  •  •  •  Xi  X2  •  •  •  Xp) 


2 


I 

r 

i 

! 

\ 

\ 

. 


I»  is  positive  for  all  the  symmetric  PDF’s  which  are  monotonically  nonincreasing 
functions  of  positive  values  of  u.  This  mild  condition  makes  sure  that  the  matrix  T 
has  positive  diagonal  elements.  If  b  =  j&o  &i  •  •  •  bp+q]T  is  a  vector  of  real  numbers 
then  brXb  is  the  squared  norm  of  the  random  vector 

r1/2[s{1)  s(2>  •••  Xi  x2  •••  Xpjb 


and  must  be  greater  than  zero  with  probability  1.  Since  X  is  positive  definite,  (25) 
has  a  unique  solution.  Further  simplification  can  be  made  by  approximating  T  by  a 
simpler  function  whenever  it  is  complicated  [Kay  and  Sengupta  1986,  A],  cr2  can  be 
estimated  from  (20)  after  u  and  a  are  determined.  Alternatively,  a  coarse  estimate 
of  a 2  can  be  obtained  by  substituting  u  and  a  in  (21).  The  suggested  method  is 
an  extension  of  the  two-stage  LS  estimator  proposed  in  an  earlier  paper  [Kay  and 
Sengupta  1987], 

The  log-likelihood  function  for  process  (2)  is  given  by 

In/ 

n=p+l 


lnfx(x)=  ]T 


^ay(xn_y  -  pT sn_j)  j  +  lnfp(x) 

>=0  J 


The  second  term  is  ignored,  as  before,  leaving  the  conditional  likelihood  function. 
The  MLE  of  is  approximately  a  solution  of 


d 

d9 


N 

E  'nf 

n=p+ 1 


=  0 


Differentiation  with  respect  to  a2  yields  an  equation  similar  to  (20) 

H  unr(«n) 


N -p 


n=p+ 1 


=  1 


(26) 


Differentiation  with  respect  to  p  and  a,  however,  does  not  give  rise  to  convenient 
forms  like  (16)  and  (18).  In  the  special  cases  of  (3)  and  (5)  this  process  reduces  to 
process  (1).  With  the  reparameterizations  (4)  and  (6)  one  can  proceed  to  estimate 
v  and  a  using  the  estimation  scheme  discussed  above,  p  can  then  be  computed 
from  the  inverse  mapping  of  (4)  and  (6).  Finally,  a2  is  obtained  by  substituting 
estimates  of  p  and  a  in  (26). 
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Thus  the  two-stage  LS  estimator  offers  an  elegant  method  of  estimating  the  AR 
filter  parameters  and  the  driving  noise  variance  along  with  the  signal  parameter  v 
for  model  (1).  Any  combination  of  signal  shapes  can  be  used  as  long  as  they  are 
linearly  independent.  In  the  case  of  model  (2)  the  applicability  of  the  proposed 
technique  is  restricted  to  special  signal  shapes.  This  underscores  the  significance  of 
the  shapes  described  by  (3)  and  (5),  which  include  finite  combination  of  sinusoids, 
exponentials  and  dc  signals.  The  following  section  illustrates  how  the  two-stage 
LS  estimator  can  be  used  for  the  detection  of  such  useful  signals  in  .nonwhite  non- 
Gaussian  noise. 

IV.  Application  to  Optimal  Detection 

Two  detection  problems  will  be  discussed  in  this  section.  The  first  problem  is 
to  detect  whether  a  signal  is  present  when  the  noise  as  well  as  the  signal,  if  any,  has 
passed  through  a  channel.  If  the  unknown  channel  is  modeled  by  an  AR  filter,  the 
detection  problem  can  be  formulated  as  a  test  of  the  following  hypotheses. 

p 

Mq  :  xn  =  —  'y  'ajXn—j  +  un 

3  =  1 
V 

Ui  :  xn  =  -Ya,xn.,  +  un  +  vTsn 

3  =  1 

(see  Figure  1(a)).  The  noise  un  is  assumed  to  have  an  even  PDF  known  except  for 
its  variance  o2 .  The  sequence  of  vector  signals  {sn}  is  assumed  to  be  known  except 
for  its  amplitude  u. 

The  second  problem  is  to  decide  whether  a  signal  is  present  in  the  presence  of 
colored  ambient  noise  [Kay  and  Sengupta  1986,  B|.  The  signal  shape  is  assumed  to 
be  known  at  the  point  of  observation  irrespective  of  the  PSD  of  the  noise.  This  is 
in  contrast  to  the  first  problem  where  the  signal  goes  through  the  same  unknown 
process  of  colorization  as  the  noise  before  it  is  observed.  If  the  noise  PSD  is  described 
by  an  AR  model,  the  second  problem  is  to  choose  between  the  hypotheses 


Y,a3X*-3 


+  un 


(see  Figure  1(b)).  The  same  assumptions  about  the  driving  noise  are  made.  The 
signal  is  assumed  to  be  known  except  for  its  amplitude  p.  Note  that  the  time  series 
equations  for  the  two  problems  under  Hi  are  identical  to  (1)  and  (2),  respectively. 

(27)  and  (28)  together  represent  a  large  class  of  problems.  The  unknown  set  of 
parameters  a  allows  for  the  unknown  correlation  pattern  of  the  noise.  The  PDF  / 
of  un  can  be  chosen  to  characterize  specific  problems  in  a  realistic  way.  Finally,  by 
allowing  the  amplitude  of  the  signal  to  be  unknown,  the  detector  is  expected  to  be 
tolerant  to  different  attenuations  and  phase  changes  of  different  components  of  the 
signal. 

The  above  two  problems  can  be  recast  as 


u0 :  eT  =  [oT  ef] 

X1:0T=  [eTq  9j)  9q*  0 


where  9t  —  [aT  cr2 ] T ,  9q  —  u  in  problem  (27)  and  9q  =  p  in  problem  (28).  In  either 
case  the  dimensions  of  9t  and  9q  are  t  =  p+  1  and  q,  respectively.  The  two  problems 
will  be  discussed  side  by  side  because  of  their  close  resemblence.  It  is  well  known 
that  there  is  no  uniformly  most  powerful  (UMP)  test  for  (29).  Yet  the  generalized 
likelihood  ratio  test  (GLRT)  is  widely  preferred  because  of  its  nice  asymptotic  (large 
sample  size)  properties  such  as  consistency,  unbiasedness  and  constant  false  alarm 
rate  (CFAR).  It  is  also  called  the  uniformly  most  powerful  invariant  (UMPI)  test 
since  it  exhibits  the  UMP  property  among  the  class  of  tests  which  are  invariant  to 
a  natural  set  of  transformations  [Lehmann  1959].  The  asymptotic  performance  of 
the  GLRT  becomes  equivalent  to  that  of  a  clairvoyant  GLRT  (i.e.,  the  test  with 
perfectly  known  values  of  9t )  under  the  condition  [Kendall  and  Stuart  1979] 


l9jSO'0t)  ~  E 


(  dlnfx(x;0„0t)\  (  dlnfx(x;0q,0t) 


V 


d9n 


vv 


d9t 


=  0 


(30) 


9q  =  0 


I £  gt(9q,9t)  is  identically  zero  for  problem  (28)  (see  (14)).  From  (10)  it  is  apparent 
that  (30)  holds  for  problem  (27)  as  well.  Hence  for  both  the  problems  the  GLRT 
can  be  said  to  be  asymptotically  optimal  in  the  sense  that  for  large  sample  size  the 
knowledge  of  a  and  a2,  the  so-cuiled  nuisance  parameters,  would  not  improve  the 
performance. 

The  GLRT  for  testing  (29)  is  to  decide  if 


c(iq,h) 

C(0,9t) 


(31) 
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for  some  threshold  7,  where  Z  is  the  likelihood  function 


£{0q,0t)  =fx(x;*„*t) 

l)t  is  the  MLE  of  0t  assuming  Mo  is  true  while  0q  and  0t  are  joint  MLE’s  of  0q  and 

A  2  £ 

0t  assuming  Mx  is  true.  9t  is  found  by  maximizing  Z(O,0t)  over  0t.  Similarly,  0q,  0t 
are  obtained  by  maximizing  £(0q,0t)  over  0q  and  0t. 

The  statistics  of  tG,  the  likelihood  ratio ,  are  difficult  to  obtain  in  general.  For 
large  data  records  (asymptotically)  it  may  be  shown  that  ?  In  Iq  is  distributed  in 
the  following  manner  (Kendall  and  Stuart  1979] . 

2ln£G  ~  X2  under  M0  (32a) 

21n£c  ~  X,2(7>-M  under  Mi  (326) 


Here  xq  represents  a  chi-square  distribution  with  q  degrees  of  freedom  and  x/2(<7,  A) 
represents  a  noncentral  chi-square  distribution  with  q  degrees  of  freedom  and  non¬ 
centrality  parameter  A.  Note  that  x/2(?»0)  =  X2  or  the  distribution  under  Mo  is  a 
special  case  of  the  distribution  under  Mi  and  occurs  when  A  =  0.  If  (30)  holds,  the 
noncentrality  parameter  A,  which  is  a  measure  of  the  discrimination  between  the 
two  hypotheses,  is  given  by 


A  =  (0, *,)]«,  =  »t,e 


dlnfx(x;  0q,Ot  \  ( dlnfx(x;  0q,0t 


d0a 


d0a 


0q  = 


(33) 


where  0q ,  0X  are  the  true  values.  The  probability  of  deciding  Mx  when  Mo  is  true 
(also  called  the  probability  of  false  alarm)  is  given  by 


PFA  =  Pr{21n£G>y|K0} 


(34a) 


where  7'  =  2  In  7.  The  probability  of  correctly  deciding  Mx  (called  the  power  of  the 
test)  is 

PD=Pr{2\nlG>1'\Mi}  (346) 


In  practice,  7'  can  be  set  to  produce  a  given  false  alarm  rate  and  Pp  can  be 
calculated  from  (346)  accordingly. 


The  likelihood  ratio  for  problems  (27)  and  (28)  have  the  common  form  (see 


^  _  fxix;07,0t)  _  fe(x;  Bq,  &t)  fp(x;  9g,  $t) 

fx(x;0  ,h)  fc(x;0, 9t)  fp(x;Q,Ot) 

The  second  factor  is  dropped  for  ease  of  computation.  A  heuristic  justification  for 
ignoring  the  second  term  is  that  its  contribution  to  tG  will  be  negligible  when  N  is 
large  and  the  two  hypotheses  are  close  to  each  other.  With  this  simplification,  the 
test  is  equivalent  to  deciding  Hi  if 

2ln£G  =  lnfc(x;  0q,  &t)  -  lnfc(x;O,0t)  >  V  (35) 

Specifically  for  problem  (27), 

„  V''  ,  v^i  ?r  2  A 


2  In lG  =  Yi  In/  -  i>  sn;o" 

r»=p+ 1  Vy=o 


E  In/ 

n=p+l  \j=0 


(36a) 


and  for  problem  (28), 


n=p+l  \}= 0 


2\nlG  =  ln/(  Eay(xn_y-Ar8n_y):r 


n=p+l  \;'=0 


-  E  1  n/(EaJx"->;&3 


Hats  and  double  hats  once  again  indicate  MLE’s  under  Hq  and  Hi,  respectively.  a0 
and  ao  are  both  defined  to  be  unity.  Figures  2(a)  and  2(b)  are  block  diagrams  of 
(36a)  and  (366),  respectively.  In  the  Gaussian  case  In/  is  a  simple  quadratic  and 
the  above  test  Statistics  reduces  to 


2  ln£G  =  {N  -  p)  In  (^Ty 


(37a) 


i  N  p 

*2  =  TfZZ  E  in*’*'-’ 


N  -  p 

yn=p+ I  W=0 


N  -p  ^  ^ 

yn=p+ 1  \;=0 


1  N  P  r 

=  vcr;  E  E^1-*  -  *  *« 
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for  problem  (27)  and 


(376) 


2  \nta  =  (iV-pJln  _ 


~  E  ( 


n=p4-l  \  ;=0 


n=p+ 1  W=0 


=  77^7  E  f  -  ^Tg»-i) 


for  problem  (28). 

In  order  to  compute  the  probabilities  of  false  alarm  and  detection  from  (34),  it 
is  necessary  to  evaluate  the  noncentrality  parameter.  For  problem  (27)  it  is  obtained 
from  (33)  and  (10)  as 


A  =  i/TSTSuIf  =  — tSq  s0[a2If]  (38) 

ai 

where  s0  =  Su  is  the  vector  of  signal  sequence  including  the  amplitude.  The 
noncentrality  parameter  for  problem  (28)  is  obtained  from  (33)  and  (14). 

A  =  nT\TVnIj  as  —jl*T (AS) T (AS)/j(ff2 If] 


where  A  is  the  (N  -  p)  x  (N  -  p)  Toeplitz  matrix 


Therefore 

A  =  \nTST(AT A)Sn[a2 If\  =  Sq  R-1s0[<72//J  (39) 

a 1 

where  s0  =  S (i  and  R  =  a2(ArA)_1  is  approximately  the  (N  -  p)  x  (N  -  p) 
covariance  matrix  of  the  noise.  Sq  R-1s0  is  the  signal  to  noise  ratio  (SNR)  at  the 


output  of  a  prewhitener  built  with  perfect  knowledge  of  the  filter  parameter  a.  To 
be  more  precise,  if  the  data  is  passed  through  an  ideal  whitener  (a  filter  which  will 
completely  whiten  the  noise),  then  s^R-1so  is  the  ratio  of  the  energy  contributions 
from  the  signal  and  noise  parts  of  the  whitener  output.  The  same  interpretation 
holds  for  SqSo/o2  in  (38)  for  the  problem  (27).  In  either  case  A  is  proportional  to 
the  SNR  at  the  output  of  a  perfect  prewhitener  and  correlator.  From  (39)  it  was 
shown  earlier  [Kay  and  Sengupta  1986,  Bj  that  the  probability  of  detection,  which 
is  a  monotonically  increasing  function  of  A,  can  be  improved  in  the  case  of  problem 
(28)  by  choosing  the  signal  to  be  one  sinusoid  along  the  direction  of  the  weakest 
noise  eigenvector,  if  any  such  knowledge  is  available  at  all.  On  the  other  hand  the 
performance  would  deteriorate  if  the  spectrum  of  the  signal  (so)  matches  the  noise 
PSD.  From  (38)  it  is  clear,  however,  that  problem  (27)  offers  io  such  flexibility. 
This  makes  intuitive  sense,  since  both  the  signal  and  the  noise  pass  through  the 
same  filter.  The  factor  cr2//  indicates  the  expected  amount  of  reduction  in  the  SNR 
required  to  achieve  a  given  probability  of  detection  for  a  non-Gaussian  noise  PDF 
over  a  Gaussian  PDF. 

In  order  to  implement  the  GLRT  detectors  (36a)  and  (366)  it  is  necessary  to 
compute  the  MLEs  of  the  unknown  parameters  under  each  hypothesis.  Alterna¬ 
tively,  the  two-stage  LS  estimator  described  in  the  previous  section  could  be  used 
as  an  approximation  to  the  MLEs.  The  simpler  version  of  the  estimator  for  esti¬ 
mation  under  Mo  was  presented  in  an  earlier  paper  [Kay  and  Sengupta  1987].  The 
extension  provided  in  this  paper  allows  one  to  choose  any  signal  shape  for  (27), 
while  only  sinusoids,  exponentials  or  dc  signals,  which  satisfy  (3)  or  (5),  can  be 
used  in  the  case  of  problem  (28).  Computationally  simpler  approximations  of  the 
GLRT  such  as  the  Rao  efficient  score  test  [Rao  1973]  can  be  used  for  more  general 
signal  shapes  in  (28).  Under  the  asssumption  of  large  sample  size  and  weak  signal 
the  Rao  statistic  Ir  is  equivalent  to  2  In  tc-  When  (30)  holds,  the  Rao  statistic  for 
testing  (29)  is 

i^M,)[Aln£(o,S,) 

The  Rao  statistic  uses  MLEs  under  the  null  hypothesis  only.  In  the  case  of  problem 
(27)  £/?  can  be  shown  to  be 

tR  =  j- hrS  [STS]~1Srh  (40) 
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where 


k  —  i/#(**p+i)//(“p+i)  /,(up+2)//(“p+2)  //(^iv)//(uiv)]'r 

p 

un  =  5>n-„  n  =  p+l,p+2,-*-,JV 

j=0 

In  the  case  of  problem  (28)  reduces  to 

=  j^hrV  [VrV]_1Vrh  (41) 

with  h  and  un  as  defined  above.  In  general  V  will  be  a  function  of  a.  In  the  special 
case  of  (3)  it  is  observed  that  V  =  SD  with 

(p  p  p  \ 

XX  rrJ  >  X)  a>r2~ J  -  •  •  • » X^  a>r-T 

J=o  j=o  ;=o  j 

Consequently  (41)  is  equivalent  to  (40).  The  equivalence  holds  by  a  similar  argu¬ 
ment  in  the  special  case  of  (5)  and  in  general  for  all  signals  which  are  composed  of 
exponentials,  damped  and  undamped  sinusoids  and  dc.  The  two-stage  LS  estima¬ 
tor  of  a  in  the  zero-mean  case  [Kay  and  Sengupta  1987]  may  be  used  in  (40)  and 
(41)  instead  of  its  MLE.  Exclusion  of  the  MLEs  under  makes  the  Rao  detector 
attractive  from  the  estimation  point  of  view.  However  its  performance  is  expected 
to  be  worse  than  that  of  the  GLRT  detector  for  short  sample  sizes  and  large  signal 
amplitudes,  since  the  former  involves  more  optimistic  approximations. 

V.  Simulation  of  Performance 

Two  non-zero  mean  AR(4)  processes  of  the  form  (2)  were  chosen  for  computer 
simulations.  The  AR  filter  parameters  for  the  processes  [Kay  and  Sengupta  1986, A] 
are  given  in  Table  A.  Process  I  is  broadband  while  process  II  is  narrowband.  A  single 
signal  component  ( q  =  l)  with  amplitude  /z  =  0.1  was  used.  The  known  part  of  the 
signal  was  chosen  to  be 

Sn  =  (-l)n  (42) 

for  n  =  1,2, This  is  of  the  form  (3)  and  therefore  the  process  can  be 
described  by  (1)  as  well.  The  equivalent  values  of  u  corresponding  to  the  processes 
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I  and  II  are  0.4592  and  1.1147,  respectively.  The  driving  noise  was  assumed  to  be 
mixed-Gaussian  with  PDF 

/(u)  =  (1  -  c)G'b(u)  +  eG/(u)  (43) 

where  Gg(u)  and  G[(u)  are  zero-mean  Gaussian  PDFs  with  variance  o\  and  a2, 
respectively.  The  mixture  parameter  e  is  a  number  between  0  and  1  and  determines 
the  composition  of  the  PDF.  With  the  assumption  a2  >>  aB,  (43)  represents  a 
nominally  Gaussian  background  distribution  Gg  contaminated  by  an  interfering 
component  G/  with  higher  variance.  This  model  can  represent  the  background 
noise  in  radar  and  sonar  communication  where  clutter  and  reverberation  gives  rise 
to  occasional  impulses  in  an  otherwise  Gaussian  ambient  noise.  The  variance  of  the 
overall  PDF  is 

a2  =  (1  -  e)e2B  +  ecrj  (44) 

Simulations  were  carried  out  with  a2B  =  1,  a2  =  100  and  e  =  0.1.  The  resulting 
variance  is  a2  =  10.9.  The  AR  process  was  generated  by  passing  a  white  mixed- 
Gaussian  process  through  a  filter,  allowing  sufficient  time  for  the  transients  to  decay. 
The  white  process  was  generated  by  randomly  selecting  from  two  mutually  inde¬ 
pendent  zero-mean  white  Gaussian  processes  with  PDFs  Gg  and  G/  on  the  basis 
of  a  series  of  Bernoulli  trials  with  probability  of  success  e.  Thus  a  random  variable 
could  be  expected  to  come  from  the  background  population  for  (1  —  e)  fraction  of 
the  time  and  from  the  contaminating  population  for  e  fraction  of  the  time.  aB  and 
a 2  were  assumed  to  be  known  while  c  was  assumed  unknown,  e  is  linearly  related 
to  a2  by  (44).  Estimating  €  is  therefore  equivalent  to  estimating  a2. 

Table  B  shows  the  sample  means  and  sample  variances  of  the  unweighted  LS 
estimators  of  a,  /z,  u  and  a2  obtained  by  solving  (22),  (4)  and  (21).  The  performance 
is  poor,  as  expected.  The  variance  of  the  LS  estimators  of  a,  /z  and  u  are  larger 
than  the  corresponding  CR  bound  by  roughly  a  factor  of  10.  This  confirms  the 
well-known  result  that  the  asymptotic  variance  of  these  LS  estimators  are  given  by 
the  “Gaussian”  CR  bound  irrespective  of  the  driving  noise  PDF  [Anderson  1971). 
The  “Gaussian”  CR  bound,  which  assumes  the  PDF  to  be  Gaussian,  is  described 
by  (156),  (15a)  and  (12a)  with  a2Ij  =  1  and  is  clearly  larger  than  the  true  CR 
bound  by  a  factor  of  a2 If.  This  factor,  which  is  approximately  10  for  the  selected 
PDF  parameters,  explains  the  discrepancy  between  the  last  two  columns  of  Table 
B.  The  results  are  based  on  500  experiments,  each  conducted  with  500  data  points. 
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Tabie  C  summarizes  the  performance  of  the  two-stage  LS  estimator  obtained 
by  solving  (23),  (4)  and  (26).  The  variances  of  the  estimators  of  a,  p  and  u  are  close 
to  the  CR  bound.  This  improvement  of  performance  is  in  accordance  with  similar 
results  reported  by  the  authors  in  the  zero-mean  case  [Kay  and  Sengupta,  1987]. 
The  variance  of  the  estimator  of  a 2  continues  to  be  much  larger  than  the  CR  bound. 
This  is  to  be  expected,  since  the  estimator  given  by  (21)  is  not  an  approximation 
of  the  MLE  in  any  sense,  the  property  of  asymptotic  efficiency  of  the  MLE  being 
inapplicable.  The  bias  of  all  the  estimators  in  Table  C  appear  to  be  significantly 
less  than  the  bias  of  their  counterparts  in  Table  B. 

The  same  AR(4)  processes  were  used  for  simulations  of  the  performance  of  the 
GLRT  and  Rao  detectors.  A  single  signal  component  of  the  form  of  (42)  was  chosen. 
p.  was  adjusted  to  yield  different  values  of  SNR  in  the  following  way.  The  SNR  as 
defined  in  the  previous  section  is 


N 

SNR  =  SqSo  =  !'2  ^  s2  =  (IV  -  p)i/2 

n=p- f  1 


Using  (4), 


SNR  = 


[N  -  4)M2 

1  —  0-\  +  02  —  fl3  4-  04 


Therefore 


M  = 


(l  —  di  +  0-2  —  03  +  a4)SNR 

N- 4 


(45) 


Thus  /x  was  calculated  for  a  given  process  such  as  to  produce  a  desired  SNR.  The 
number  of  data  points  ( N )  was  chosen  to  be  50.  A  probability  of  false  alarm 
PFA  =  0.01  was  used  to  evaluate  the  detection  performance.  The  value  of  7' 
necessary  for  this  is  6.635,  as  obtained  from  the  tables  of  central  x2  distribution 
with  one  degree  of  freedom.  The  asymptotic  p  formance  of  the  GLRT  or  Rao 
detector  is  described  by  (326)  and  (346).  The  noncentrality  parameter,  as  obtained 
from  (38)  or  (39)  is 

A  =  SNR[<72//] 


A  is  calculated  for  each  value  of  the  SNR  and  the  theoretical  or  asymptotic  perfor¬ 
mance  is  evaluated  accordingly. 
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Since  the  signal  has  only  one  component,  the  Rao  detector  described  by  (41), 
which  in  this  case  is  equivalent  to  (40),  simplifies  to  [Kay  and  Sengupta  1986,  Cj 


Lr  = 


N 

(  P  ^ 

* 

2 

£ 

y  "a-jSn-j 

unr(un) 

n=p+l 

j 

»j 

N 


(46) 


w2)  X 

n  =  p+l  \;=0 


where  the  dependence  of  If  on  a 2  is  noted.  Ir  was  computed  from  the  above 
expression  for  500  different  sets  of  data,  each  of  length  N  =  50,  for  a  given  SNR. 
The  theoretical  value  of  the  threshold  y  was  used.  The  number  of  times  £r  exceeded 
7',  divided  by  500  (the  number  of  experiments)  was  regarded  as  the  experimental 
value  of  the  probability  of  detection.  The  MLE  of  the  AR  filter  parameters  required 
by  (40)  under  Uq  were  replaced  by  the  corresponding  two-stage  LS  estimators. 

The  GLRT  detector  is  given  by  (366)  with  p  =  4,  q  =  1.  The  MLE’s  were  again 
replaced  by  the  two-stage  LS  estimators. 

Figure  3(a)  plots  the  probabilities  of  detection  of  the  Rao  detector  and  the 
GLRT  detector  along  with  the  theoretical  performance  vs.  SNR  for  noise  process  I. 
Figure  3(6)  plots  the  same  for  noise  process  II.  The  performance  of  the  GLRT  de¬ 
tector  is  very  close  to  the  theoretical  performance.  This  confirms  that  a  sample  size 
of  iV  =  50  is  sufficient  for  the  theoretical  predictions  of  performance  of  the  GLRT 
to  be  valid.  Replacement  of  the  MLEs  by  the  computationally  simple  two-stage 
LS  estimators  appears  to  be  a  reasonable  approximation.  The  performance  of  the 
Rao  detector  is  equivalent  to  that  of  the  GLRT  detector  for  low  SNR.  However  its 
performance  deteriorates  for  stronger  signals.  The  Rao  detector  essentially  approx¬ 
imates  a  difference  by  a  derivative  [Rao  1973].  A  large  sample  size  and  closeness  of 
the  two  hypothses  are  crucial  to  this  approximation.  The  failure  of  the  Rao  test  at 
moderate  SNR  is  an  evidence  of  its  inapplicability  for  short  sample  sizes. 


VI.  Conclusions 

Two  models  of  non-zero  mean  non-Gaussian  AR  processes  were  considered. 
The  Cramer-Rao  bounds  for  the  parameters  of  each  model  were  presented.  A 
mismatch  of  the  signal  spectrum  and  the  noise  PSD  favors  precise  estimation  of  the 
signal  parameters.  The  CR  bounds  for  the  estimators  of  the  AR  filter  parameters  do 
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not  depend  on  the  signal  in  the  case  of  model  (2).  Estimation  of  these  parameters 
in  model  (1)  may  be  at  least  as  precise,  while  a  properly  designed  signal  can  reduce 
the  CR  bounds.  In  each  case,  the  CR  bounds  for  thse  two  sets  of  parameters  are 
less  in  the  case  of  a  non-Gaussian  PDF  compared  to  a  Gaussian  PDF. 

A  two-stage  LS  estimator  was  proposed  for  the  estimation  of  the  signal  coeffi¬ 
cients  and  the  AR  filter  parameters.  This  is  an  extention  of  a  technique  proposed 
earlier  for  zero-mean  processes  and  is  derived  as  an  approximation  to  the  MLE. 
The  two-stage  LS  estimator  provides  a  closed  form  solution  to  a  set  of  approxi¬ 
mated  likelihood  equations.  This  technique  allows  any  signal  shape  for  model  (l), 
while  for  model  (2)  only  permissible  signal  components  are  damped  or  undamped 
sinusoids,  exponentials  and  dc. 

The  two-stage  LS  estimator  was  then  applied  to  the  GLRT  detection  of  signals 
in  colored  non-Gaussian  noise.  Two  mixing  models  were  considered  and  the  GLRT 
detector  was  derived  for  each  one  of  them.  The  Rao  detector  for  these  problems 
were  also  presented.  Computer  simulations  indicated  the  success  of  the  GLRT  and 
the  failure  of  the  Rao  detector  for  moderate  SNR  and  short  sample  sizes. 


APPENDIX 


Derivation  of  the  Cramer  Rao  Bounds 


The  first  step  to  determine  the  CR  bounds  is  the  evaluation  of  the  information 
matrix  for  the  parameter  vector  $  defined  by  (9)  for  process  (1).  One  can  proceed 
from  (8)  neglecting  the  second  term  on  the  right  hand  side. 


d 2  lnfc(x) 


dtd*T 


_  Y'  F  d2  \nf[un) 

JZr  1  L  d*d*T 


=  E  E 


T»=p+1 


d  In  /(un)  \  (  d  In  /(«„) 


-  E  * 


n=p+I 


(A.1) 
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where  g  is  the  PDF  of  the  driving  noise  samples  scaled,  by  a , 


g(t)  =  of  (at) 


(■ 


so  that  g(t)  always  has  unity  variance.  The  partial  derivatives  are  obtained  as 


di /, 
da} 
do2 


(■ 

(- 

(• 


Note  that 


Hence 


where 


=  (o2f'(un)  \  =  f'(un) 
J  V  ^/(«n)  /  /(«n) 


If  =  E 


h  ln/(“) 
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Furthermore 


Entries  of  the  information  matrix  can  be  readily  obtained  from  (,4.3)-(^4.5)  using 
(A.6)-(A.8)  and  a  few  other  trivial  identities  such  as  the  expectation  of  an  odd 
function  being  zero. 
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using  the  definition  S  =  is^1)  •  •  •  s^h 


/  ainfc(x)N\  (  ainfc(x) 

V  dUi 
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It  follows  that 


E 


^ainfc(x)  ^  ^ainfc(x)  ^  T 


where  M  =  [mi  m2  •  •  •  m 
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where  C  is  the  pxp  covariance  matrix  of  the  data.  Note  that  x„  —  mn  is  a  zero- 
mean  wide  sense  stationary  (WSS)  process  even  though  xn  has  a  time-varying  mean. 
From  the  above  equation  it  follows  that 
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where 


(10)  follows  directly  from  equations  (A.9)-(A.14).  The  Cramer  Rao  bounds  are 
obtained  from  the  inverse  of  (10).  (12c)  results  readily  from  the  block  diagonal 
property  of  the  matrix.  If  S  and  M  are  scaled  by  a  and  C  is  scaled  by  a2,  then 
a2 If  can  be  pulled  out  from  the  four  blocks  of  1$  corresponding  to  u  and  a.  One 
can  then  make  use  of  the  result 

(  A  B  \  _1  _  /  (A  -  BC-1D)-1  —(A  —  BC-1D)-1BC-1  \ 

\D  C  J  V-(C  -DA~1B)-1DA-1  (C-DA^B)"1  ) 

As  a  consequence,  the  lower  bound  on  Cov(i >)  is 

~4r  [snTS„  -  SnTM.n[{N  -  p)Cn  4-  MnTM„]-1MriTSnl 

O*  1  f  J 

=  [snT  [i  -  Mn[(AT  -  p)Cn  +  MnrMn]-1Mnr]  Sn] 

=  ^ Tf  [snT  [i  +  Mn[~MnTMn  +  (N-  p) Cn  +  M„rMn]-1MnT]  Sn] 

=  ^~  Snr  [l  +  Mn((iV-p)Cri]-1Mnr]~1Sn  (A. 15) 

using  the  identity 

[A  +  BCD)-1  =  A-1  -  A-1B[DA-1B  +  Cr^^DA-1, 

(A. 15)  is  identical  with  (12a).  Finally,  using  the  inversion  formula  for  block  matrices 
once  again,  the  lower  bound  on  Cov(a)  is 

[(^  -  p)c" +  M-rMn  -  M„Tsn(snTsn)-1snTMnl -1 

O  If  i.  J 

=  [(iV  -  p)Cn  +  Mnr  [i  -  Snr(SriTSn)-1Sn]  Mn]  _1  (A. 16) 
which  is  the  same  as  (126), 


28 


References 

1.  T.W.  Anderson,  The  Statistical  Analysis  of  Time  Series,  Chapter  5,  New  York: 
John  Wiley,  1971. 

2.  G.E.P.  Box  and  G.J.  Jenkins,  Time  Series  Analysis:  Forecasting  and  Control , 
Chapter  7,  San  Francisco:  Holden-Day,  1970. 

3.  P.  Eykhoff,  System  Identification,  Chapter  6,  New  York:  John  Wiley,  1974. 

4.  P.J.  Huber,  Robust  Statistics,  Chapter  4,  New  York:  John  Wiley,  1981. 

5.  S.M.  Kay  and  D.  Sengupta,  “Simple  and  Efficient  Estimation  of  Parameters  for 
Non-Gaussian  Autoregressive  Processes”,  submitted  for  review  to  IEEE  Trans, 
on  Acoustics,  Speech  and  Signal  Processing,  1986. 

6.  S.M.  Kay  and  D.  Sengupta,  “Detection  in  Incompletely  Characterized  Colored 
Non-Gaussian  Noise  via  Parametric  Modeling”,  submitted  for  review  to  IEEE 
Trans,  on  Acoustics,  Speech  and  Signal  Processing,  1986. 

7.  S.M.  Kay  and  D.  Sengupta,  “Statistically/Computationally  Efficient  Detec¬ 
tion  in  Incompletely  Characterized  Colored  Non-Gaussian  Noise  via  Paramet¬ 
ric  Modeling” ,  submitted  for  review  to  IEEE  Trans,  on  Acoustics,  Speech  and 
Signal  Processing,  1986. 

8.  S.M.  Kay  and  D.  Sengupta,  “Statistically/Computationally  Efficient  Estima¬ 
tion  of  Non-Gaussian  AR  Processes”,  presented  at  the  International  Conference 
on  ASSP,  Dallas,  TX,  April  6-9,  1987. 

9.  Sir  S.M.  Kendall  and  A.  Stuart,  The  Advanced  Theory  of  Statistics  Vol.  II, 
Chapters  18-19,  New  York:  Macmillan  Publishing,  1979. 

10.  C.H.  Lee,  “Robust  Linear  Prediction  for  Speech  Analysis”,  presented  at  the 
International  Conference  on  ASSP,  Dallas,  TX,  April  6-9,  1987. 

11.  E.L.  Lehmann,  Testing  Statistical  Hypotheses,  New  York:  John  Wiley,  1959. 

12.  R.D.  Martin,  “Robust  Estimation  for  Time  Series  Autoregressions”  in  Robust¬ 
ness  in  Statistics,  edited  by  R.L.  Launer  and  G.N.  Wilkinson,  New  York:  Aca¬ 
demic,  1979. 

13.  R.D.  Martin,  “The  Cramer-Rao  Bound  and  Robust  M-Estimates  for  Time 
Series  Autoregressions”,  Biometrica,  pp.  437-442,  Vol.  69,  No.  2,  1982. 

14.  C.R.  Rao,  Linear  Statistical  Inference  and  its  Applications,  Chapters  5-6,  New 
York:  John  Wiley,  1973. 


29 


15.  D.  Sengupta  and  S.M.  Kay,  ‘‘Efficient  Estimation  of  Parameters  for  Non- 
Gaussian  Autoregressive  Processes’’,  submitted  for  review  to  IEEE  Trans,  on 
Acoustics,  Speech  and  Signal  Processing,  1986. 


Table  A:  Parameters  of  the  AR  processes  used  for  simulation 


fl2 

03 

.. 

U4 

poles 

1.338 

-0.662 

0.240 

0.7  exp{±j27r(0.12)| 
0.7expj±;2*r(0.21)] 

3.809 

-2.654 

0.924 

0.98  exp[±j2jr(0.1l)J 

0.98  exp[±j2x(0.14)j 

Table  B:  Performance  of  the  unweighted  LS  Estimator,  N  =  500 


0.2400 

0.1000 

0.4592 
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10.9000 
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3504 

3377 

6621 

2425 

0997 

4578 


10.8280 


-2.7544 

3.7944 

-2.6394 

0.9174 

0.0999 

1.1092 

10.8245 


Sample 

Cramer-Rao 

Bias2 

variance 

boimd 

2.682  x  10"6 

1.9277  x  10~ 3 

2.0697  x  10“ 4 

8.862  x  10“7 

4.6654  x  HT3 

5.1217  x  10~4 

3.434  x  10~9 

4.8690  X  10~3 

5.1217  x  10~4 

6.081  x  10-6 

1.7834  x  10~3 

2.0697  x  10~4 

6.625  x  10“ 8 

1.1229  x  10-3 

1.1353  x  10"4 

1.899  x  10"6 

2.3824  x  10~2 

2.3959  x  10" 3 

5.186  x  10-3 

6.0815 

0.6323 

3.134  x  10_s 

3.1395  x  lO-4 

3.2114  x  10~5 

2.124  x  10-4 

1.6867  x  10“ 3 

1.5815  x  10-4 

2.232  x  10'4 

1.7395  x  lO-3 

1.5815  x  10-4 

4.391  x  10-6 

3.7182  x  lO"4 

3.2114  x  10"5 

1.323  x  10-8 

1.9070  x  lO- 4 

1.9266  x  10~6 

3.052  x  10"5 

2.3450  x  10~2 

2.3939  x  10*3 

5.703  x  10" 3 

6.0833 

0.6323 

Table  C:  Performance  of  the  Two-stage  LS  Estimator,  N  =  500 
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